Scientific Python antipatterns advent calendar day nineteen

For today, a fairly simple idea: the Python standard library has many commonly used values built in, or makes them easy to generate. As a reminder, I’ll post one tiny example per day, each intended to take only a couple of minutes to read.

If you want to read them all but can’t be bothered checking this website each day, sign up for the mailing list and I’ll send a single email at the end with links to them all.

Manually creating things that are in the standard library

In many of the previous installments, we have looked at variations of the general antipattern of manually recreating tools that are already implemented in Python. What is true for functions is also true for data: many commonly used values are already available, or can easily be generated.

Many of these relate to string processing. Rather than manually typing out collections of letters:

lowercase_letters = 'abcdefghijklmnopqrstuvwxyz'
uppercase_letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

we can use the versions that Python has already defined:

import string
print(string.ascii_lowercase)
print(string.ascii_uppercase)

abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ

as well as various other useful collections:

# all uppercase and lowercase letters
string.ascii_letters

'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

# all ascii punctuation
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

# digits
string.digits

'0123456789'
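One common use of string.punctuation is removing punctuation from text, for example before counting words. Here is a minimal sketch, assuming only ASCII punctuation matters (curly quotes and other non-ASCII symbols are not in this constant); strip_punctuation is a hypothetical helper name used only for illustration:

```python
import string

# Translation table that deletes every ASCII punctuation character:
# characters in the third argument of str.maketrans are mapped to None
REMOVE_PUNCTUATION = str.maketrans('', '', string.punctuation)

def strip_punctuation(text):
    """Return text with every character in string.punctuation removed."""
    return text.translate(REMOVE_PUNCTUATION)

print(strip_punctuation("Hello, world! It's 9:30."))
# Hello world Its 930
```

The module also defines a few constants not shown here, such as string.hexdigits, string.octdigits and string.whitespace.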

Although we can easily convert these to other data structures if we like:

print(set(string.punctuation))

{'}', ':', '*', '?', '"', "'", '\\', '[', '(', '{', '-', '^', '%', '#', '|', '=', '$', ';', '!', ')', '`', '~', '>', '@', '&', '/', ']', '+', '.', ',', '<', '_'}

it’s often not necessary, as we can iterate over strings easily in Python:

for digit in string.digits:
    print(digit)
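Strings also support membership tests with `in` directly, and combining iteration with enumerate gives a quick way to build lookup tables. As a small illustration (letter_position is just a name chosen for this example), here is a mapping from each lowercase letter to its position in the alphabet:

```python
import string

# Membership tests work directly on the string, no set needed
print('q' in string.ascii_lowercase)  # True
print('Q' in string.ascii_lowercase)  # False

# Map each letter to its 1-based position: {'a': 1, 'b': 2, ..., 'z': 26}
letter_position = {letter: i for i, letter in enumerate(string.ascii_lowercase, start=1)}
print(letter_position['a'], letter_position['z'])  # 1 26
```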

A similar idea arises when working with time series data. Rather than writing out the days of the week:

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

we can use the built-in versions:

import calendar
list(calendar.day_name)

['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

which also conveniently contains abbreviated versions:

list(calendar.day_abbr)

['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

For example, we can easily create a useful dictionary that maps short names to long ones:

dict(zip(calendar.day_abbr, calendar.day_name))

{'Mon': 'Monday',
 'Tue': 'Tuesday',
 'Wed': 'Wednesday',
 'Thu': 'Thursday',
 'Fri': 'Friday',
 'Sat': 'Saturday',
 'Sun': 'Sunday'}
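Because calendar.day_name starts on Monday, its indices match the value returned by date.weekday() (Monday is 0, Sunday is 6), so we can look up the name for a particular date directly. A minimal sketch; note that these names follow the current locale, so the outputs shown assume the default English locale, and the date is an arbitrary example:

```python
import calendar
from datetime import date

# date.weekday() returns 0 for Monday through 6 for Sunday,
# the same order used by calendar.day_name and calendar.day_abbr
d = date(2024, 12, 19)
print(calendar.day_name[d.weekday()])  # Thursday
print(calendar.day_abbr[d.weekday()])  # Thu
```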

The same is true for months:

list(calendar.month_abbr)

['',
 'Jan',
 'Feb',
 'Mar',
 'Apr',
 'May',
 'Jun',
 'Jul',
 'Aug',
 'Sep',
 'Oct',
 'Nov',
 'Dec']

but note that this sequence helpfully starts with an empty string, so that the month indices line up with the usual human numbering, i.e. January is 1 rather than 0:

# what is the name of the fifth month?
calendar.month_abbr[5]

'May'
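The empty first element also means we can build the reverse lookup, from abbreviation to month number, without any off-by-one adjustment. A minimal sketch, where month_number is a hypothetical name and the empty placeholder is skipped; as with day names, the abbreviations follow the current locale:

```python
import calendar

# enumerate gives each abbreviation its index, which is already the
# human month number because index 0 holds the empty placeholder
month_number = {abbr: i for i, abbr in enumerate(calendar.month_abbr) if abbr}

print(month_number['May'])  # 5
print(month_number['Dec'])  # 12
```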

Sometimes it’s tempting to make a list of numbers that follow a particular pattern:

three_times_table = [3,6,9,12]

but this is better handled by range:

list(range(3,15,3))

[3, 6, 9, 12]

as it’s easier for a reader to see the pattern, and to make changes.
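For instance, wrapping the range call in a small function makes it easy to change either the multiplier or the number of rows. The times_table name and its rows parameter are purely illustrative, and the sketch assumes n is a positive integer:

```python
def times_table(n, rows=12):
    """Return the first `rows` multiples of a positive integer n."""
    # range excludes its stop value, so stop at n * rows + 1
    # to make sure the last multiple is included
    return list(range(n, n * rows + 1, n))

print(times_table(3, 4))  # [3, 6, 9, 12]
print(times_table(7, 5))  # [7, 14, 21, 28, 35]
```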

Lists of numbers that don’t follow a simple linear pattern like this are more tempting to hard-code:

powers_of_ten = [1, 10, 100, 1000]
squares = [1,4,9,16,25]

but can generally be handled with a list comprehension:

[10 ** x for x in range(4)]

[1, 10, 100, 1000]

[x ** 2 for x in range(1,6)]

[1, 4, 9, 16, 25]
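The same approach covers other non-linear patterns, and if we only need an aggregate rather than the list itself, a generator expression avoids building the list at all. A brief illustration:

```python
# Triangular numbers: 1, 1+2, 1+2+3, ...
triangular = [n * (n + 1) // 2 for n in range(1, 6)]
print(triangular)  # [1, 3, 6, 10, 15]

# Sum of the first five squares without creating an intermediate list
print(sum(x ** 2 for x in range(1, 6)))  # 55
```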

Finally, there might be times when we need some mathematically special numbers, and it’s tempting to type them in by hand:

PI = 3.1415

def circle_area(radius):
    return PI * radius ** 2

circle_area(2)

12.566

but these are also available in the standard library:

import math
math.pi, math.e

(3.141592653589793, 2.718281828459045)
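So the circle_area function from above can simply use math.pi. The math module also defines a few other constants such as math.tau (2π), math.inf and math.nan; the circumference function here is an extra example added only for illustration:

```python
import math

def circle_area(radius):
    # math.pi is accurate to full double precision, unlike a hand-typed 3.1415
    return math.pi * radius ** 2

def circumference(radius):
    # math.tau is the full-circle constant, equal to 2 * pi
    return math.tau * radius

print(circle_area(2))  # 12.566370614359172
print(circumference(1) == 2 * math.pi)  # True
```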

One more time: if you want to see the rest of these little write-ups, sign up for the mailing list.